Abstract. This paper presents a novel technique for process discovery.
In contrast to the current trend, which only considers an event log for
discovering a process model, we assume two additional inputs: an
independence relation on the set of logged activities, and a collection of
negative traces. After deriving an intermediate net unfolding from them,
we perform a controlled folding giving rise to a Petri net which contains
both the input log and all independence-equivalent traces arising from it.
Remarkably, the derived Petri net cannot execute any trace from the
negative collection. The entire chain of transformations is fully automated.
A tool has been developed and experimental results are provided that
witness the significance of the contribution of this paper.


1 Introduction 

The derivation of process models from partial observations has received
significant attention in recent years, as it enables eliciting evidence-based formal
representations of the real processes running in a system [17]. This discipline,
known as process discovery, shares its premises with regression analysis, i.e.,
only when moderate assumptions are made on the input data can one derive
faithful models that represent the underlying system.

Formally, a technique for process discovery receives as input an event log,
containing the footprints of a process' executions, and produces a model (e.g., a
Petri net) describing the real process. Many process discovery algorithms in the
literature make strong implicit assumptions. A widely used one is log completeness,
requiring every possible trace of the underlying system to be contained in
the event log. This is hard to satisfy for systems with cyclic or infinite behavior,
but also for systems that evolve continuously over time. Another implicit assumption
is the lack of noise in the log, i.e., traces denoting exceptional behavior that
should not be contained in the derived process model. Finally, every discovery
technique has a representational bias. For instance, the α-algorithm [16] can only
discover Petri nets of a specific class (structured workflow nets).

This is the unabridged version of a paper with the same title that appeared in the
proceedings of ATVA 2015.

Fig. 1. Unfolding-based process discovery.

Few attempts have been made to remove the aforementioned assumptions.
One promising direction is to ease the discovery problem by assuming that
more knowledge about the underlying system is available as input. On this line,
the works in [9,11,10] are among the few that use domain knowledge in terms of
negative information, expressed by traces which do not represent process behavior.
In this paper we follow this direction, but additionally incorporate a crucial
piece of information for the task of process discovery: when a pair of activities
are independent of each other. One example could be the different tests that
a patient should undergo in order to obtain a diagnosis: blood test, allergy test,
and radiology test, which are independent of each other. We believe that obtaining
this coarse-grained independence information from a domain expert is an easy and
natural step; however, if it is not available, one can estimate it by
analysing the log with some of the techniques in the literature, e.g., the relations
computed by the α-algorithm [17].

The approach of this paper is summarized in Fig. 1. Starting from an event
log and an independence relation on its set of activities, we conceptually construct
a collection of labeled partial orders whose linearizations include both the
sequences in the log as well as those in the same Mazurkiewicz trace [12], i.e., those
obtained via successive permutations of independent activities. We then merge
(the common prefixes of) this collection into an event structure which we next
transform into an occurrence net representing the same behavior. Finally, we
perform a controlled generalization by selectively folding the occurrence net into
a Petri net. This step yields a net that (a) can execute all traces contained in
the event log, and (b) generalizes the behavior of the log in a controlled manner,
introducing no execution given in the collection of negative traces. The folding
process is driven by a folding equivalence relation, which we synthesize using
SMT. Different folding equivalences guarantee different properties about the final
net. The paper proposes three different classes of equivalences and studies
their properties. In particular we define a class of independence-preserving folding
equivalences, guaranteeing that the natural independence relation in the final
net will equal the one given by the expert.

In summary, the main contributions of the paper are:

— A general and efficient translation from prime event structures to occurrence 
nets (Section 3). 

— Three classes of folding equivalences of interest not only in process discovery 
but also in formal verification of concurrent systems (Section 4). 

— A method to synthesize folding equivalences using SMT (Section 5). 

— An implementation of our approach and experimental results witnessing its 
capacity to even rediscover the original model (Section 6). 

Remarkably, the discovery technique of this paper implements for the first time
one of the operations foreseen in [6], which advocates the unified use of event
structures to support process mining operations.

2 Preliminaries 

Events: given an alphabet of actions $A$, several occurrences of a given action can
happen in a run or execution. In this paper we consider a set $E$ of events
representing the occurrence of actions in executions. Each event $e \in E$ has the form
$e := \langle a, H \rangle$, where $a \in A$ and $H \subseteq E$ is a subset of events causing $e$ (its history).
The label of an event is given by a function $\lambda : E \to A$ defined as $\lambda(\langle a, H \rangle) := a$.

Labeled partial orders (lpos): we represent a labeled partial order by a pair
$(E, \le)$, where $\le \subseteq E \times E$ is a reflexive, antisymmetric and transitive relation on
the set $E$ of events. Two distinct events $e, e' \in E$ can be either ordered ($e \le e'$
or $e' \le e$) or concurrent ($e \not\le e'$ and $e' \not\le e$). Observe that all events are implicitly
labeled by $\lambda$.

Petri nets: a net consists of two disjoint sets $P$ and $T$ representing respectively
places and transitions together with a set $F$ of flow arcs. The notion of state of
the system in a net is captured by its markings. A marking is a multiset $M$ of
places, i.e., a map $M : P \to \mathbb{N}$. We focus on the so-called safe nets, where markings
are sets, i.e., $M(p) \in \{0,1\}$ for all $p \in P$. A Petri net (PN) is a net together
with an initial marking and a total function that labels its transitions over an
alphabet $A$ of observable actions. Formally a PN is a tuple $N := (P, T, F, \lambda, M_0)$
where (i) $P \neq \emptyset$ is a set of places; (ii) $T \neq \emptyset$ is a set of transitions such that
$P \cap T = \emptyset$; (iii) $F \subseteq (P \times T) \cup (T \times P)$ is a set of flow arcs; (iv) $\lambda : T \to A$ is a
labeling mapping; and (v) $M_0 \subseteq P$ is an initial marking. Elements of $P \cup T$ are
called the nodes of $N$. For a transition $t \in T$, we call ${}^\bullet t := \{p \mid (p,t) \in F\}$ the
preset of $t$, and $t^\bullet := \{p \mid (t,p) \in F\}$ the postset of $t$. In figures, we represent
as usual places by empty circles, transitions by squares, $F$ by arrows, and the
marking of a place $p$ by black tokens in $p$. A transition $t$ is enabled in marking
$M$, written $M \xrightarrow{t}$, iff ${}^\bullet t \subseteq M$. This enabled transition can fire, resulting in a
new marking $M' := (M \setminus {}^\bullet t) \cup t^\bullet$. This firing relation is denoted by $M \xrightarrow{t} M'$. A
marking $M$ is reachable from $M_0$ if there exists a firing sequence, i.e. transitions
$t_1, \dots, t_n$ such that $M_0 \xrightarrow{t_1} \dots \xrightarrow{t_n} M$. The set of reachable markings from $M_0$
is denoted by $\mathit{reach}(N)$. The set of co-enabled transitions is
$\mathit{coe}(N) := \{(t,t') \mid \exists M \in \mathit{reach}(N) : {}^\bullet t \subseteq M \wedge {}^\bullet t' \subseteq M\}$.
The set of observations of a net is the image over $A$ of its fireable sequences, i.e.,
$\sigma \in \mathit{obs}(N)$ iff $M_0 \xrightarrow{t_1} \dots \xrightarrow{t_n} M$ and $\lambda(t_1) \dots \lambda(t_n) = \sigma$.
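The firing rule for safe nets can be illustrated with a minimal Python sketch. The representation (presets and postsets stored as dictionaries of frozensets) and the names `enabled` and `fire` are assumptions made for illustration; they are not taken from the paper or its tool, and the two-transition net is hypothetical.

```python
from typing import Dict, FrozenSet

Marking = FrozenSet[str]
Arcs = Dict[str, FrozenSet[str]]


def enabled(pre: Arcs, t: str, m: Marking) -> bool:
    """A transition is enabled at m iff its whole preset is marked."""
    return pre[t] <= m


def fire(pre: Arcs, post: Arcs, t: str, m: Marking) -> Marking:
    """Return the marking (m minus preset of t) union postset of t."""
    if not enabled(pre, t, m):
        raise ValueError(f"{t} is not enabled")
    return (m - pre[t]) | post[t]


# Hypothetical net: p0 -> t1 -> p1 -> t2 -> p2
pre = {"t1": frozenset({"p0"}), "t2": frozenset({"p1"})}
post = {"t1": frozenset({"p1"}), "t2": frozenset({"p2"})}
m0 = frozenset({"p0"})
m1 = fire(pre, post, "t1", m0)
m2 = fire(pre, post, "t2", m1)
```

Markings are sets here because only safe nets are considered; an unsafe net would need a multiset representation instead.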

Occurrence nets: occurrence nets can be seen as infinite Petri nets with a special
acyclic structure that highlights conflict between transitions that compete
for resources. Places and transitions of an occurrence net are usually called
conditions and events. Formally, let $N := (P,T,F)$ be a net, $<$ the transitive
closure of $F$, and $\le$ the reflexive closure of $<$. We say that transitions $t_1$
and $t_2$ are in structural conflict, written $t_1 \#_s t_2$, if and only if $t_1 \neq t_2$ and
${}^\bullet t_1 \cap {}^\bullet t_2 \neq \emptyset$. Conflict is inherited along $<$, that is, the conflict relation $\#$ is given
by $a \# b \Leftrightarrow \exists t_a, t_b \in T : t_a \#_s t_b \wedge t_a \le a \wedge t_b \le b$. Finally, the concurrency relation
$co$ holds between nodes $a, b \in P \cup T$ that are neither ordered nor in conflict, i.e.
$a \mathrel{co} b \Leftrightarrow \neg(a \le b) \wedge \neg(a \# b) \wedge \neg(b \le a)$.

A net $\beta := (B,E,F)$ is an occurrence net iff (i) $\le$ is a partial order; (ii) for
all $b \in B$, $|{}^\bullet b| \in \{0,1\}$; (iii) for all $x \in B \cup E$, the set $[x] := \{y \in E \mid y \le x\}$ is
finite; (iv) there is no self-conflict, i.e. there is no $x \in B \cup E$ such that $x \# x$.
The initial marking $M_0$ of an occurrence net is the set of conditions with an
empty preset, i.e. $\forall b \in B : b \in M_0 \Leftrightarrow {}^\bullet b = \emptyset$. Every $\le$-closed and conflict-free set
of events $C$ is called a configuration and generates a reachable marking defined
as $\mathit{Mark}(C) := (M_0 \cup C^\bullet) \setminus {}^\bullet C$. We also assume a labeling function $\lambda : E \to A$
from events in $\beta$ to alphabet $A$. Conditions are of the form $\langle e, X \rangle$ where $e \in E$
is the event generating the condition and $X \subseteq E$ are the events consuming
it. Occurrence nets are the mathematical form of the partial order unfolding
semantics of a Petri net [7]; we use the terms occurrence net and
unfolding interchangeably.

Lemma 1. Let $\beta$ be an occurrence net such that $(e,e') \in \mathit{coe}(\beta)$, then for every
$b \in {}^\bullet e$ and $b' \in {}^\bullet e'$ we have $b = b'$ or $b \mathrel{co} b'$.

Proof. Routine.

Conditions in an occurrence net can be removed by keeping the causal dependencies
and introducing a conflict relation; the obtained object is an event
structure [13].

Event structures: an event structure is a tuple $\mathcal{E} := (E, \le, \#)$ where $E$ is a set
of events; $\le \subseteq E \times E$ is a partial order (called causality) satisfying the property
of finite causes, i.e. $\forall e \in E : |[e]| < \infty$ where $[e] := \{e' \in E \mid e' \le e\}$; $\# \subseteq E \times E$
is an irreflexive symmetric relation (called conflict) satisfying the property of
conflict heredity, i.e. $\forall e, e', e'' \in E : e \# e' \wedge e' \le e'' \Rightarrow e \# e''$. Note that in
most cases one only needs to consider reduced versions of relations $\le$ and $\#$,
which we will denote $\lessdot$ and $\#_d$, respectively. Formally, $\lessdot$ (which we call direct
causality) is the transitive reduction of $\le$, and $\#_d$ (direct conflict) is the smallest
relation inducing $\#$ through the property of conflict heredity. A configuration is
a computation state represented by a set of events that have occurred; if an event
is present in a configuration, then so must all the events on which it causally
depends. Moreover, a configuration does not contain conflicting events. Formally,
a configuration of $(E, \le, \#)$ is a set $C \subseteq E$ such that $e \in C \Rightarrow (\forall e' \le e : e' \in C)$,
and $(e \in C \wedge e \# e') \Rightarrow e' \notin C$. The set of configurations of $\mathcal{E}$ is denoted by $\Omega(\mathcal{E})$.

Mazurkiewicz traces: let $A$ be a finite alphabet of letters and $\diamond \subseteq A \times A$ a
symmetric and irreflexive relation called independence. The relation $\diamond$ induces
an equivalence relation $\equiv_\diamond$ over $A^*$. Two words $\sigma$ and $\sigma'$ are equivalent ($\sigma \equiv_\diamond \sigma'$)
if there exists a sequence $\sigma_1 \dots \sigma_k$ of words such that $\sigma = \sigma_1$, $\sigma' = \sigma_k$ and for all
$1 \le i < k$ there exist words $\sigma'_i, \sigma''_i$ and letters $a_i, b_i$ satisfying

$\sigma_i = \sigma'_i a_i b_i \sigma''_i$, $\sigma_{i+1} = \sigma'_i b_i a_i \sigma''_i$, and $(a_i, b_i) \in \diamond$

Thus, two words are equivalent by $\equiv_\diamond$ if one can be obtained from the other by
successive commutation of neighboring independent letters. For a word $\sigma \in A^*$
the equivalence class of $\sigma$ under $\equiv_\diamond$ is called a Mazurkiewicz trace [12].
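As an illustration of this equivalence, the following Python sketch enumerates the Mazurkiewicz trace of a word by repeatedly swapping adjacent independent letters. It assumes that words are strings of single-character letters and that the independence relation is given as a set of pairs that is read symmetrically; the function name `trace_class` is hypothetical.

```python
from collections import deque
from typing import Set, Tuple


def trace_class(word: str, indep: Set[Tuple[str, str]]) -> Set[str]:
    """Return every word reachable from `word` by commuting adjacent independent letters."""
    symmetric = set(indep) | {(b, a) for (a, b) in indep}
    seen = {word}
    queue = deque([word])
    while queue:
        w = queue.popleft()
        for i in range(len(w) - 1):
            if (w[i], w[i + 1]) in symmetric:
                # swap the letters at positions i and i + 1
                swapped = w[:i] + w[i + 1] + w[i] + w[i + 2:]
                if swapped not in seen:
                    seen.add(swapped)
                    queue.append(swapped)
    return seen


only_ab = trace_class("abc", {("a", "b")})
ab_and_bc = trace_class("abc", {("a", "b"), ("b", "c")})
```

The search is a plain breadth-first exploration, so it is only practical for short words; it is meant to make the definition concrete, not to be efficient.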

We now describe the problem tackled in this paper, one of the main challenges 
in the process mining field [17]. 

Process Discovery: a log $\mathcal{L}$ is a finite set of traces over an alphabet $A$ representing
the footprints of the real process executions of a system $S$ that is only (partially)
visible through these runs. Process discovery techniques aim at extracting from
a log $\mathcal{L}$ a process model $\mathcal{M}$ (e.g., a Petri net) with the goal to elicit the process
underlying in $S$. By relating the behaviors of $\mathcal{L}$, $\mathit{obs}(\mathcal{M})$ and $S$, particular
concepts can be defined [5]. A log is incomplete if $S \setminus \mathcal{L} \neq \emptyset$. A model $\mathcal{M}$ fits log
$\mathcal{L}$ if $\mathcal{L} \subseteq \mathit{obs}(\mathcal{M})$. A model is precise in describing a log $\mathcal{L}$ if $\mathit{obs}(\mathcal{M}) \setminus \mathcal{L}$ is small.
A model $\mathcal{M}$ represents a generalization of log $\mathcal{L}$ with respect to system $S$ if some
behavior in $S \setminus \mathcal{L}$ exists in $\mathit{obs}(\mathcal{M})$. Finally, a model $\mathcal{M}$ is simple when it has
the minimal complexity in representing $\mathit{obs}(\mathcal{M})$, i.e., the well-known Occam's
razor principle. It is widely acknowledged that the size of a process model is the
most important simplicity indicator. Let $\mathcal{U}_N$ be the universe of nets, we define a
function $c : \mathcal{U}_N \to \mathbb{N}$ to measure the simplicity of a net by counting the number
of some of its elements, e.g., its transitions and/or places.
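These quality notions are plain set relations, which a small Python sketch can make concrete. It assumes, purely for illustration, that the system behavior and the observations of the model are available as finite sets of traces (in practice both may be infinite and the system is unknown); all function names and the example data are hypothetical.

```python
from typing import Set


def is_incomplete(system: Set[str], log: Set[str]) -> bool:
    """The log misses some behavior of the system."""
    return bool(system - log)


def fits(log: Set[str], observations: Set[str]) -> bool:
    """Every logged trace is an observation of the model."""
    return log <= observations


def extra_behavior(log: Set[str], observations: Set[str]) -> Set[str]:
    """Model behavior not seen in the log; the smaller, the more precise."""
    return observations - log


def generalizes(system: Set[str], log: Set[str], observations: Set[str]) -> bool:
    """The model exhibits some unlogged behavior that belongs to the system."""
    return bool((system - log) & observations)


system = {"ab", "ba", "abc"}
log = {"ab"}
observations = {"ab", "ba"}
```

With these example sets the log is incomplete, the model fits it, and the single extra trace `ba` is a sound generalization because it belongs to the system.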

3 Independence-Preserving Discovery 

Let $S$ be a system whose set of actions is $A$. Given two actions $a, b \in A$ and one
state $s$ of $S$, we say that $a$ and $b$ commute at $s$ when

— if a can fire at s and its execution reaches state s', then b is possible at s iff 
it is possible at s'; and 

— if both a and b can fire at s, then firing ab and ba reaches the same state. 

Commutativity of actions at states identifies an equivalence relation in the set
of executions of the system $S$; it is a ternary relation, relating two transitions
with one state.


Since asking the expert to provide the commutativity relation of $S$ would
be difficult, we restrict ourselves to unconditional independence, i.e., a conservative
overapproximation of the commutativity relation that is a sole property
of transitions, as opposed to transitions and states. An unconditional independence
relation of $S$ is any binary, symmetric, and irreflexive relation $\diamond \subseteq A \times A$
satisfying that if $a \diamond b$ then $a$ and $b$ commute at every reachable state of $S$. If
$a, b$ are not independent according to $\diamond$, then they are dependent, denoted by
$a \otimes b$.

In this section, given a log $\mathcal{L} \subseteq A^*$, representing some behaviors of $S$, and an
arbitrary unconditional independence $\diamond$ of $S$, provided by the expert, we construct
an occurrence net whose executions contain $\mathcal{L}$ together with all sequences
in $A^*$ which are $\equiv_\diamond$-equivalent to some sequence in $\mathcal{L}$.

If commuting actions are not declared independent by the expert (i.e., $\diamond$ is
smaller than it could be), then $\mathcal{M}$ will be more sequential than $S$; if some
actions that do not commute are marked as independent, then $\mathcal{M}$ will not
be a truthful representation of $S$. The use of expert knowledge in terms of an
independence relation is a novel feature not considered before in the context of
process discovery. We believe this is a powerful and practical way to tackle the problem
of log incompleteness, since it suffices to observe in the
log one representative trace of a class of $\equiv_\diamond$ to include the whole set of traces of
the class in the process model's executions.

Our final goal is to generate a Petri net that represents the behavior of the
underlying system. We start by translating $\mathcal{L}$ into a collection of partial orders
whose shape depends on the specific definition of $\diamond$.

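Definition 1. Given a sequence $\sigma \in A^*$ and an independence relation $\diamond \subseteq A \times A$,
we associate to $\sigma$ a labeled partial order $\mathit{lpo}_\diamond(\sigma)$ inductively defined by:

1. If $\sigma = \varepsilon$, then let $\bot := \langle \tau, \emptyset \rangle$ and set $\mathit{lpo}_\diamond(\sigma) := (\{\bot\}, \emptyset)$.

2. If $\sigma = \sigma' a$, then let $\mathit{lpo}_\diamond(\sigma') := (E', \le')$ and let $e := \langle a, H \rangle$ be the single event
such that $H$ is the unique $\subseteq$-minimal, causally-closed set of events in $E'$
satisfying that for any event $e' \in E'$, if $\lambda(e') \otimes a$, then $e' \in H$. Then set
$\mathit{lpo}_\diamond(\sigma) := (E, \le)$ with $E := E' \cup \{e\}$ and $\le := \le' \cup (H \times \{e\})$.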
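The inductive construction of Definition 1 can be sketched in Python as follows. Events are encoded as pairs (label, history) with the history stored as a frozenset of events. Two points are assumptions of this sketch rather than statements of the paper: the initial event is placed in the history of every other event, and the stored order omits reflexive pairs. The names `BOTTOM` and `lpo` are hypothetical.

```python
from typing import Set, Tuple

# Initial event: label "tau" and empty history.
BOTTOM = ("tau", frozenset())


def lpo(trace: str, indep: Set[Tuple[str, str]]):
    """Return (events, causality) for `trace`; `indep` is read as symmetric."""

    def dependent(a: str, b: str) -> bool:
        return (a, b) not in indep and (b, a) not in indep

    events = [BOTTOM]
    causality = set()  # pairs (cause, effect); reflexive pairs are omitted
    for a in trace:
        # Assumption of this sketch: BOTTOM precedes every event.
        history = {BOTTOM}
        for e in events[1:]:
            if dependent(e[0], a):
                # adding e together with its own history keeps H causally closed
                history |= {e} | e[1]
        new_event = (a, frozenset(history))
        causality |= {(h, new_event) for h in history}
        events.append(new_event)
    return set(events), causality


same_1 = lpo("ab", {("a", "b")})
same_2 = lpo("ba", {("a", "b")})
chain = lpo("ab", set())
```

Because an event is identified by its label and history, the traces `ab` and `ba` yield the same partial order when `a` and `b` are independent, which is the property the later merging step relies on.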

Since a system rarely generates a single observation, we need a compact 
way to model all the possible observations of the system. We represent all the 
partially ordered executions of a system with an event structure. 

Definition 2. Given a set of partial orders $S := \{(E_i, \le_i) \mid 1 \le i \le n\}$, we define
$ES(S) := (E, \le, \#)$ where:

1. $E := \bigcup_{1 \le i \le n} E_i$,

2. $\le := \left( \bigcup_{1 \le i \le n} \le_i \right)^*$, and

3. for $e := \langle a, H \rangle$ and $e' := \langle b, H' \rangle$, we have that $e \#_d e'$ (read: $e$ and $e'$ are in
direct conflict) iff $e' \notin H$, $e \notin H'$ and $a \otimes b$. The conflict relation $\#$ is the
smallest relation that includes $\#_d$ and is inherited w.r.t. $\le$, i.e., for $e \# e'$
and $e \le f$, $e' \le f'$, one has $f \# f'$.
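The direct conflict test of Definition 2 can be sketched in Python on events encoded as pairs (label, history). The sketch assumes that only pairs of distinct events are compared and that the initial event belongs to every history; both are assumptions of this illustration. The events below are a hand-built encoding of the traces `abc` and `bd` under an empty independence relation, and all names are hypothetical.

```python
from itertools import combinations
from typing import Set, Tuple


def direct_conflicts(events, indep: Set[Tuple[str, str]]):
    """Pairs of distinct events with dependent labels, neither in the other's history."""

    def dependent(a: str, b: str) -> bool:
        return (a, b) not in indep and (b, a) not in indep

    pairs = set()
    for e, f in combinations(events, 2):
        (a, hist_e), (b, hist_f) = e, f
        if f not in hist_e and e not in hist_f and dependent(a, b):
            pairs.add(frozenset({e, f}))
    return pairs


bottom = ("tau", frozenset())
a = ("a", frozenset({bottom}))
b1 = ("b", frozenset({bottom, a}))      # b of trace abc
c = ("c", frozenset({bottom, a, b1}))
b2 = ("b", frozenset({bottom}))         # b of trace bd
d = ("d", frozenset({bottom, b2}))
events = [bottom, a, b1, c, b2, d]

conflicts = direct_conflicts(events, set())
```

Every event of the first trace ends up in direct conflict with every event of the second one, since all labels are pairwise dependent and the two histories never overlap beyond the initial event.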



Lemma 2. $ES(S) := (E, \le, \#)$ from Definition 2 is an event structure.

Proof. Clearly, $\le$ is reflexive, antisymmetric and transitive by the Kleene closure:
violations of these properties on $\le$ would contradict the corresponding properties
in some lpo in $S$ since every event is characterized by the set of its causal
events. Now, the definition of $e \#_d e'$ is clearly symmetric in the roles of $e$ and
$e'$ since $\otimes$ is a symmetric relation; symmetry is also inherited under $\le$.

Lemma 3. $ES(S) := (E, \le, \#)$ from Definition 2 is unique.

Proof. The set $E$ and the relation $\le$ are clearly unique since they are defined
from the union and Kleene closure operators, respectively, which derive unique
results. Now, $\#_d$ is unique since its definition in Definition 2 is based on removing
all the causality from $\otimes$, and hence only one possible relation is obtained for $\#_d$.
By taking the smallest relation including $\#_d$ that is inherited with respect to $\le$,
again only one possible relation is obtained for $\#$.

Given a set of finite partial orders $S$, we now show that $S$ is included in the
configurations of the event structure obtained by Definition 2. This means that
our event structure is a fitting representation of $\mathcal{L}$.

Proposition 1. If $S$ is finite, then $S \subseteq \Omega(ES(S))$.

Proof. By Definition 2.1, all the events of the partial orders are part of the event
structure. Let $(E_i, \le_i) \in S$, clearly $E_i$ is causally closed in $ES(S)$ (Definition 2.2).
By Definition 2.3, any event in conflict with some event $e \in E_i$ is not in its past;
we can conclude that $E_i$ is conflict free and therefore a configuration. □

Since we want to produce a Petri net, we now need to "attach conditions" to the
result of Definition 2. Event structures and occurrence nets are conceptually very
similar objects, so this might seem very easy to the acquainted reader. However,
this definition is crucial for the success of the subsequent folding step (Section 4),
as we will be constrained to merge conditions in the preset and postset of an
event when we merge the event. As a result, the conditions that we produce now
should constrain the future folding step as little as possible.

Definition 3. Given an event structure $\mathcal{E} := (E, \le, \#)$ we construct the occurrence
net $\beta := (B, E \setminus \{\bot\}, F)$ in two steps

1. Let $G := (V, A)$ be a graph where $V := E$ and $(e_1, e_2) \in A$ iff $e_1 \#_d e_2$.
For each clique (maximal complete subgraph) $K := \{e_1, \dots, e_n\}$ of $G$, let
$C_K := [e_1] \cap \dots \cap [e_n]$ and $c_K \in \max(C_K)$. We add a condition $b$ to $B$ and
set $b \in c_K^\bullet$ and $b \in {}^\bullet e_i$ for $i = 1 \dots n$.

2. For each $e \in E$, let $G_e := (V_e, A_e)$ be a graph where $V_e := \{e' \in E \mid e \lessdot e'\}$ and
$(e_1, e_2) \in A_e$ iff $\lambda(e_1) \otimes \lambda(e_2)$. For each clique $K_e := \{e_1, \dots, e_n\}$ of $G_e$, we
add a condition $b$ to $B$ and set $b \in e^\bullet$ and $b \in {}^\bullet e_i$ for $i = 1 \dots n$.


Definition 3.1 adds a condition for every set of pairwise direct conflicting
events; the condition is generated by some event $c_K$ which is in the past of
every conflicting event and consumed by all of them; by the latter the conflict
of the event structure is preserved in the occurrence net. For each event and its
immediate successors, Definition 3.2 adds conditions between them to preserve
causality. To minimize the number of conditions, only one condition is generated
for successor events having dependent labels. This step does not introduce
new conflicts in the occurrence net: since the events have dependent labels and
none is in the past of the other, by Definition 2 they are also in conflict in
the event structure.

We note that Winskel already explained, in categorical terms, how to relate
an event structure with an occurrence net [20]. However, his definition is
of interest only in that context, while ours focuses on a practical and efficient
translation.

Given a log $\mathcal{L}$ and an independence relation $\diamond$, the net obtained applying
Definitions 1, 2 and 3, in this order, is denoted by $\beta_{\mathcal{L},\diamond}$. Since every trace in $\mathcal{L}$ is
a linearization of some of the partial orders in the set $S$ obtained by Definition 1
and these partial orders are included by Proposition 1 in the configurations of
$ES(S)$ (which are the same as the configurations in $\beta_{\mathcal{L},\diamond}$), the obtained net is
fitting.

Proposition 2. Let $\mathcal{L}$ be a log and $\diamond$ an independence relation, for every $\sigma \in \mathcal{L}$
we have $\sigma \in \mathit{obs}(\beta_{\mathcal{L},\diamond})$.

Proof. Since every trace is a linearization of some partial order obtained by
Definition 2, by Proposition 1 every trace is a linearization of the maximal
configurations of the event structure; since causality and conflict are preserved by
Definition 3, their configurations coincide, the trace corresponds to a sequential
execution of the occurrence net and the result holds.

It is worth noticing that the obtained net generalizes the behavior of the
model, but in a controlled manner imposed by the independence relation. For
instance, if $\mathcal{L} := \{ab\}$ and $a \diamond b$, then $ba \in \mathit{obs}(\beta_{\mathcal{L},\diamond})$, even if this behavior was
not present in the log. If the expert rightly declared $a$ and $b$ independent (i.e.,
if they commute at all states of $S$), then necessarily $ba$ is a possible observation
of $S$, even if it is not in $\mathcal{L}$. The extra information provided by the expert allows
us to generalize the discovered model in a provably sound manner, thus coping
with the log incompleteness problem.

Proposition 3. Let $\beta_{\mathcal{L},\diamond} := (B, E, F)$ be the unfolding obtained from the log $\mathcal{L}$
with $\diamond$ as the independence relation. For all pairs of events $e, e' \in E$ such that
$(e, e') \in \mathit{coe}(\beta_{\mathcal{L},\diamond})$ we have ${}^\bullet e \cap {}^\bullet e' \neq \emptyset \Leftrightarrow \lambda(e) \otimes \lambda(e')$.

Proof.

$\Rightarrow$) Let $b \in {}^\bullet e \cap {}^\bullet e'$; if $b$ was added by Definition 3.2, the result trivially holds
since the condition was added in the preset of the events in the clique which
relates only events with dependent labels; if $b$ was added by Definition 3.1, then
it was added in the preset of the events in the clique which relates only direct
conflicting events and then $e \#_d e'$ in the event structure. This means that none
can be in the past of the other, because otherwise some event would be in
self-conflict, which is ruled out in event structures; now, by Definition 2 we have
$\lambda(e) \otimes \lambda(e')$.

$\Leftarrow$) Let $\lambda(e) \otimes \lambda(e')$, events $e$ and $e'$ could not be generated from the same
log since if not they would be causally related (see Definition 1) contradicting
$(e, e') \in \mathit{coe}(\beta_{\mathcal{L},\diamond})$; since they were generated from different logs and have
dependent labels, we have from Definition 2 that $e \#_d e'$; since they are in conflict,
Definition 3.1 adds a condition in their presets and finally ${}^\bullet e \cap {}^\bullet e' \neq \emptyset$. □

The independence relation between labels gives rise to an arbitrary relation
between transitions of a net (not necessarily an independence relation):

Definition 4. Let $\diamond \subseteq A \times A$ be an independence relation, $N := (P, T, F, \lambda, M_0)$
a net, and $\lambda : T \to A$. We define relation $\diamond_N \subseteq T \times T$ between transitions of $N$ as

$t \diamond_N t' \Leftrightarrow \lambda(t) \diamond \lambda(t')$.

In the next section we will define an approach to fold $\beta_{\mathcal{L},\diamond}$ into a Petri net
whose natural independence relation equals $\diamond$. To formalize our approach we
first need to define such natural independence.

Definition 5. Let $N := (P, T, F)$ be a net. We define the natural independence
relation $\parallel_N \subseteq T \times T$ on $N$ as

$t \parallel_N t' \Leftrightarrow {}^\bullet t \cap {}^\bullet t' = \emptyset \wedge t^\bullet \cap {}^\bullet t' = \emptyset \wedge {}^\bullet t \cap t'^\bullet = \emptyset$.

In fact, one can prove that when $N$ is safe, then $\parallel_N$ is the notion of independence
underlying the unfolding semantics of $N$. In other words, the equivalence
classes of $\equiv_{\parallel_N}$ are in bijective correspondence with the configurations in the
unfolding of $N$. The following result shows that the natural independence on the
discovered occurrence net corresponds to the relation provided by the expert,
when we restrict both to the set of co-enabled transitions.
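The natural independence test of Definition 5 only inspects presets and postsets, as the following Python sketch shows. The dictionary-based net representation, the function name and the three-transition example net are assumptions made for illustration.

```python
from typing import Dict, FrozenSet

Arcs = Dict[str, FrozenSet[str]]


def naturally_independent(pre: Arcs, post: Arcs, t: str, u: str) -> bool:
    """True iff t and u share no input place and neither feeds the other."""
    return (
        not (pre[t] & pre[u])
        and not (post[t] & pre[u])
        and not (pre[t] & post[u])
    )


# Hypothetical net: t1 and t2 work on disjoint places, t3 consumes the output of t1.
pre = {
    "t1": frozenset({"p0"}),
    "t2": frozenset({"q0"}),
    "t3": frozenset({"p1"}),
}
post = {
    "t1": frozenset({"p1"}),
    "t2": frozenset({"q1"}),
    "t3": frozenset({"p2"}),
}
```

In this example `t1` and `t2` are naturally independent, while `t1` and `t3` are not because the postset of `t1` intersects the preset of `t3`.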

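Theorem 1. Let $\beta_{\mathcal{L},\diamond}$ be the occurrence net from the log $\mathcal{L}$ with $\diamond$ as the
independence relation, then

$\diamond_{\beta_{\mathcal{L},\diamond}} \cap \mathit{coe}(\beta_{\mathcal{L},\diamond}) = \parallel_{\beta_{\mathcal{L},\diamond}} \cap \mathit{coe}(\beta_{\mathcal{L},\diamond})$

Proof.

$\subseteq$) Let $(e, e') \in \diamond_{\beta_{\mathcal{L},\diamond}} \cap \mathit{coe}(\beta_{\mathcal{L},\diamond})$, then from Definition 4 follows that $\lambda(e) \diamond \lambda(e')$
and by Proposition 3 we have ${}^\bullet e \cap {}^\bullet e' = \emptyset$. Suppose $e^\bullet \cap {}^\bullet e' \neq \emptyset$, then $\exists b_1 \in {}^\bullet e$
such that $\forall b_2 \in {}^\bullet e'$ it holds that $b_1 < b_2$ and by Lemma 1 $(e, e') \notin \mathit{coe}(\beta_{\mathcal{L},\diamond})$
which leads to a contradiction. Using the same reasoning it can be proven that
${}^\bullet e \cap e'^\bullet = \emptyset$. By Definition 5 we can conclude that $(e, e') \in \parallel_{\beta_{\mathcal{L},\diamond}} \cap \mathit{coe}(\beta_{\mathcal{L},\diamond})$.

$\supseteq$) Let $(e, e') \in \parallel_{\beta_{\mathcal{L},\diamond}} \cap \mathit{coe}(\beta_{\mathcal{L},\diamond})$, by Definition 5 we get ${}^\bullet e \cap {}^\bullet e' = \emptyset$ and since
they are co-enabled, by Proposition 3 follows $\lambda(e) \diamond \lambda(e')$; finally by Definition 4
we have $(e, e') \in \diamond_{\beta_{\mathcal{L},\diamond}}$ and since the events were co-enabled by assumption
$(e, e') \in \diamond_{\beta_{\mathcal{L},\diamond}} \cap \mathit{coe}(\beta_{\mathcal{L},\diamond})$. □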


4 Introducing Generalization 


The construction described in the previous section guarantees that the unfolding
obtained is fitting (see Proposition 1). However, the difference between $S$ and $\mathcal{L}$
may be significant (e.g., $S$ can contain cyclic behavior that can be instantiated
an arbitrary number of times whereas only finite traces exist in $\mathcal{L}$) and the
unfolding may be poor in generalization. The goal of this section is to generalize
$\beta_{\mathcal{L},\diamond}$ in a way that the right patterns from $S$, partially observed in $\mathcal{L}$ (e.g., loops),
are incorporated in the generalized model. To generalize, we fold the discovered
occurrence net. This folding is driven by an equivalence relation $\sim$ on $E \cup B$
that dictates which events merge into the same transition, and analogously for
conditions; events cannot be merged with conditions. We write $[x]_\sim := \{x' \mid x \sim x'\}$
for the equivalence class of node $x$. For a set $X$, $[X]_\sim := \{[x]_\sim \mid x \in X\}$ is a
set of equivalence classes.

Definition 6 (Folded net [8]). Let $\beta := (B, E, F)$ be an occurrence net and $\sim$
an equivalence relation on the nodes of $\beta$. The folded Petri net (w.r.t. $\sim$) is defined
as $\beta^\sim := (P_\sim, T_\sim, F_\sim, M_{0\sim})$ where

$P_\sim := \{[b]_\sim \mid b \in B\}$, $F_\sim := \{([x]_\sim, [y]_\sim) \mid (x, y) \in F\}$,

$T_\sim := \{[e]_\sim \mid e \in E\}$, $M_{0\sim}([b]_\sim) := |\{b' \in [b]_\sim \mid {}^\bullet b' = \emptyset\}|$.

Notice that the initial marking of the folded net is not necessarily safe. Safeness
of the net depends on the chosen equivalence relation (see Proposition 4).
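The folding of Definition 6 can be sketched in Python by mapping every node to the name of its equivalence class. The dictionary `cls`, the function name `fold` and the small example net are assumptions of this illustration; the example is chosen so that the folded initial marking puts two tokens in one place, showing why safeness is not guaranteed.

```python
from typing import Dict, Set, Tuple


def fold(conditions: Set[str], events: Set[str],
         flow: Set[Tuple[str, str]], cls: Dict[str, str]):
    """Fold a net along `cls`, which maps each node to its class name."""
    places = {cls[b] for b in conditions}
    transitions = {cls[e] for e in events}
    arcs = {(cls[x], cls[y]) for (x, y) in flow}
    # a condition is initial iff no arc enters it
    has_input = {y for (_, y) in flow}
    m0 = {p: 0 for p in places}
    for b in conditions:
        if b not in has_input:
            m0[cls[b]] += 1
    return places, transitions, arcs, m0


# Hypothetical occurrence net with two concurrent b-labeled events.
conditions = {"p1", "p2", "p3", "p4"}
events = {"b1", "b2"}
flow = {("p1", "b1"), ("b1", "p3"), ("p2", "b2"), ("b2", "p4")}
cls = {"p1": "P", "p2": "P", "p3": "p3", "p4": "p4", "b1": "b", "b2": "b"}

places, transitions, arcs, m0 = fold(conditions, events, flow, cls)
```

Here the two initial conditions `p1` and `p2` are concurrent, so merging them yields a place holding two tokens.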

4.1 Language-Preserving Generalization 

Different folding equivalences guarantee different properties on the folded net.
From now on we focus our attention on three interesting classes of folding
equivalences. The first preserves sequential executions of $\beta_{\mathcal{L},\diamond}$.

Definition 7 (Sequence-preserving folding equivalence). Let $\beta$ be an occurrence
net; an equivalence relation $\sim$ is called a sequence preserving (SP) folding
equivalence iff $e_1 \sim e_2$ implies $\lambda(e_1) = \lambda(e_2)$ and $[{}^\bullet e_1]_\sim = [{}^\bullet e_2]_\sim$ for all events
$e_1, e_2 \in E$.

From the definition above it follows that $e_1 \sim e_2$ implies $\forall b \in {}^\bullet e_1 : \exists b' \in {}^\bullet e_2$
with $b \sim b'$. Since for every folded net obtained from a SP folding equivalence
only equally labeled events are merged, we define $\lambda([e]_\sim) := \lambda(e)$.
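Definition 7 translates into a direct check over pairs of merged events. The Python sketch below assumes the equivalence is given as a dictionary from nodes to class names; the function name `is_sp` and the two-event example are hypothetical and do not reproduce the nets of the paper's figures.

```python
from typing import Dict, Set


def is_sp(events: Set[str], label: Dict[str, str],
          pre: Dict[str, Set[str]], cls: Dict[str, str]) -> bool:
    """Merged events must carry the same label and have presets with equal classes."""
    for e1 in events:
        for e2 in events:
            if cls[e1] != cls[e2]:
                continue
            if label[e1] != label[e2]:
                return False
            if {cls[b] for b in pre[e1]} != {cls[b] for b in pre[e2]}:
                return False
    return True


events = {"b1", "b2"}
label = {"b1": "b", "b2": "b"}
pre = {"b1": {"p1"}, "b2": {"p2"}}

# merges the two events but keeps their presets apart
merge_events_only = {"b1": "B", "b2": "B", "p1": "p1", "p2": "p2"}
# additionally merges the presets
merge_presets_too = {"b1": "B", "b2": "B", "p1": "P", "p2": "P"}
```

The first relation fails the check because the merged events have presets in different classes; the second passes because the presets are merged as well.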

Example 1. Consider the log $\mathcal{L} = \{abc, bd\}$ and the independence relation $\diamond = \emptyset$.
Fig. 2 shows the obtained unfolding $\beta_{\mathcal{L},\diamond}$ (left) and three of its folded nets. The
equivalence relation $\sim_1$ merges events labeled by $b$, but it does not merge their
presets, i.e. $\sim_1$ is not a SP folding equivalence. It can be observed that $bd$ is not
fireable in $\beta^{\sim_1}_{\mathcal{L},\diamond}$. Whenever two events are merged, their preconditions need to
be merged to preserve sequential executions. The equivalence relation $\sim_2$ does
not only merge events labeled by $b$, but it also sets $p_1 \sim_2 p_2$ and is a SP folding
equivalence. The folded net $\beta^{\sim_2}_{\mathcal{L},\diamond}$ can replay every trace in the log $\mathcal{L}$, but it also
adds new traces of the form $a^*$, $a^*b$, $a^*bc$, $a^*bd$, $a^*bcd$ and $a^*bdc$.

Fig. 2. Folding equivalences and folded nets.


Given an unfolding, every SP folding equivalence generates a net that preserves
its sequential executions.

Theorem 2. Let $\beta$ be an occurrence net and $\sim$ a SP folding equivalence, then
every fireable sequence $M_0 \xrightarrow{e_1} \dots \xrightarrow{e_n} M_n$ from $\beta$ generates a fireable sequence
$[M_0]_\sim \xrightarrow{[e_1]_\sim} \dots \xrightarrow{[e_n]_\sim} [M_n]_\sim$ from $\beta^\sim$.

Proof. We reason inductively on the length of the fireable sequence.

Base case: if $n = 0$, the result holds since an empty sequence of events from $\beta$
generates an empty sequence of transitions that is trivially a fireable sequence
from $\beta^\sim$.

Inductive case: we assume every fireable sequence $M_0 \xrightarrow{e_1} \dots \xrightarrow{e_n} M_n$ from
$\beta$ generates a fireable sequence $[M_0]_\sim \xrightarrow{[e_1]_\sim} \dots \xrightarrow{[e_n]_\sim} [M_n]_\sim$ from $\beta^\sim$; we
need to prove that the sequence $M_0 \xrightarrow{e_1} \dots \xrightarrow{e_{n+1}} M_{n+1}$ generates a fireable
sequence $[M_0]_\sim \xrightarrow{[e_1]_\sim} \dots \xrightarrow{[e_{n+1}]_\sim} [M_{n+1}]_\sim$. Consider a fireable sequence
$e_1 \dots e_{n+1}$ from $\beta$, then by the inductive hypothesis, we know that the first $n$
events generate a firing sequence $[e_1]_\sim \dots [e_n]_\sim$ from $\beta^\sim$ leading to the marking
$[M_n]_\sim$; we need to prove that $[M_n]_\sim \xrightarrow{[e_{n+1}]_\sim}$. Suppose this is not true,
then there exists $[b]_\sim \in {}^\bullet [e_{n+1}]_\sim$ with $[b]_\sim \notin [M_n]_\sim$; the latter implies $b \not\sim b_n$
for every $b_n \in M_n$. Definition 6 does not add a flow arrow from a place $[b]_\sim$
to a transition $[e]_\sim$ unless a flow arrow exists between a condition $b' \in [b]_\sim$
and an event $e' \in [e]_\sim$ in the occurrence net. Therefore $[b]_\sim \in {}^\bullet [e_{n+1}]_\sim$ implies
there exists $b_1 \sim b$ and $e'_{n+1} \sim e_{n+1}$ such that $b_1 \in {}^\bullet e'_{n+1}$ (there exists a flow
arrow between $b_1$ and $e'_{n+1}$ in $\beta$), and from the transitivity of $\sim$ follows that

$b_1 \not\sim b_n$ for every $b_n \in M_n$. (*)

Since $\beta$ allows a firing sequence of length $n + 1$, we know $M_n \xrightarrow{e_{n+1}}$ and then
$\forall b_2 \in {}^\bullet e_{n+1} : b_2 \in M_n$. As every $b_2$ in ${}^\bullet e_{n+1}$ is also in $M_n$, by (*) we have
$b_1 \not\sim b_2$ for all $b_2 \in {}^\bullet e_{n+1}$. From $b_1 \in {}^\bullet e'_{n+1}$, $e_{n+1} \sim e'_{n+1}$ and the fact that $\sim$ is a
SP folding equivalence (Definition 7) follows that there exists $b_2 \in {}^\bullet e_{n+1}$ such
that $b_1 \sim b_2$, but we showed this is not possible; therefore our assumption was
false and for all $[b]_\sim \in {}^\bullet [e_{n+1}]_\sim$ we have $[b]_\sim \in [M_n]_\sim$. Finally $[M_n]_\sim \xrightarrow{[e_{n+1}]_\sim}$
and $[M_0]_\sim \xrightarrow{[e_1]_\sim} \dots \xrightarrow{[e_{n+1}]_\sim} [M_{n+1}]_\sim$ is a fireable sequence from $\beta^\sim$. □

As a corollary of the result above and Proposition 2, the folded net obtained
from $\beta_{\mathcal{L},\diamond}$ with a SP folding equivalence is fitting.

Corollary 1. Let $\mathcal{L}$ be a log, $\diamond$ an independence relation and $\sim$ a SP folding
equivalence, then for every $\sigma \in \mathcal{L}$ we have $\sigma \in \mathit{obs}(\beta^\sim_{\mathcal{L},\diamond})$.

Proof. Since by Proposition 1 every $\sigma \in \mathcal{L}$ corresponds to a fireable sequence in
$\beta_{\mathcal{L},\diamond}$, the result follows immediately from Theorem 2.

Example 2. We saw in Example 1 that every trace from C can be replayed in 
/3ff^, but (as expected) the net accepts more traces. However this net also adds 
some independence between actions of the system: after firing b the net puts 
tokens at [psj-a and [p4]~2 and the reached marking enables concurrently actions 
c and d which contradicts c <S> d (the independence relation 0 = 0 implies c^d). 
In order to avoid this extra independence, we now consider the following class 
of equivalences. 

Definition 8 (Independence-preserving folding equivalence). Let (3 be 

an occurrence net and O an independence relation; an equivalence relation ~ is 
called an independence preserving (IP) folding equivalence iff 

1. is a SP folding equivalence, 

2. A(ei) O A(e 2 ) [*ei].. n [* 62 ]- = 0 a [’ei]. n [e 2 *]~ = 0 a [e*]. n [* 62 ]- = 0 

for all events 61,62 e E. 

3. bi CO &2 implies bi / 62 for all conditions 61,62 s B. 

IP folding equivalences not only preserve the sequential behavior of /3, but 
also ensure that /3~ and /3 exhibit the same natural independence relation. 

The definition above differs from the folding equivalence definition given in 
[ 8 ]; they consider occurrence nets coming from an unfolding procedure which 
takes as an input a net. This procedure generates a mapping between conditions 
and events of the generated occurrence net and places and transitions in the 
original net. Such mapping is necessary to define their folding equivalence. In 
our setting, the occurrence net does not come from a given net and therefore the 
mapping is not available. 

Example 3. The equivalence $\sim_2$ from Fig. 2 is not an IP folding equivalence
since the intersection of the equivalence classes of the presets of $c$ and $d$ is empty
($[{}^\bullet c]_{\sim_2} = \{[p_4]_{\sim_2}\}$, $[{}^\bullet d]_{\sim_2} = \{[p_3]_{\sim_2}\}$ and $\{[p_4]_{\sim_2}\} \cap \{[p_3]_{\sim_2}\} = \emptyset$), but $c$ and $d$
are not independent. Consider the equivalence relation $\sim_3$ which merges events
labeled by $b$ and sets $p_1 \sim_3 p_2$ and $p_3 \sim_3 p_4$; this relation is an IP folding
equivalence. It can be observed in the net $\beta^{\sim_3}_{\mathcal{L},\diamond}$ of Fig. 2 that all the traces from
the log can be replayed, but new independence relations are not introduced.





The occurrence net $\beta_{\mathcal{L},\diamond}$ is clearly safe. We show that $\beta^{\sim}_{\mathcal{L},\diamond}$ is also safe when $\sim$
is an IP folding equivalence. In this work, we constrain IP equivalences to generate
safe nets because their natural independence relation is well understood
(Definition 5), thus allowing us to assign a solid meaning to the class IP. It is
unclear what the natural unconditional independence of an unsafe net is, and
extending our definitions to such nets is subject of future work.

Proposition 4. Let $\beta_{\mathcal{L},\diamond}$ be the unfolding obtained from the log $\mathcal{L}$ with $\diamond$ as
the independence relation and $\sim$ an IP folding equivalence. Then $\beta^{\sim}_{\mathcal{L},\diamond}$ is safe.

Proof. The unfolding $\beta_{\mathcal{L},\diamond}$ is trivially safe since its initial marking puts one
token in its minimal conditions and each condition contains only one event in its
preset and that event cannot put more than one token in the condition. Suppose
$\beta^{\sim}_{\mathcal{L},\diamond}$ is not safe; by the above this is possible iff there exists $C \in reach(\beta_{\mathcal{L},\diamond})$
and $b_1, b_2 \in C$ such that $b_1 \sim b_2$. If $b_1$ and $b_2$ belong to a reachable marking, then
they must be concurrent and since $\sim$ is an IP folding equivalence they cannot be
merged, which leads to a contradiction. Finally $\beta^{\sim}_{\mathcal{L},\diamond}$ must be safe. □

Theorem 1 shows that the structural relation between events of the unfolding 
and the relation generated by the independence given by the expert coincide 
(when we restrict to co-enabled events); the result also holds for the folded net 
when an IP folding equivalence is used. 

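Theorem 3. Let $\beta_{\mathcal{L},\diamond}$ be the unfolding obtained from the log $\mathcal{L}$ with $\diamond$ as the
independence relation and $\sim$ an IP folding equivalence, then the independence
relation induced by $\diamond$ on the transitions of $\beta^{\sim}_{\mathcal{L},\diamond}$ (Definition 4) coincides
with the natural independence relation of $\beta^{\sim}_{\mathcal{L},\diamond}$ (Definition 5).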

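Proof. Let $(t,t') \in \diamond_{\beta_\sim}$; from Definition 4 this is true iff $\lambda(t) \diamond \lambda(t')$, which is
true iff for all $e \in t, e' \in t'$ we have $\lambda(e) \diamond \lambda(e')$ (since the folding equivalence
preserves labeling). As $\sim$ is an IP folding equivalence, independence between labels
holds iff for all $e \in t, e' \in t'$ we have $[{}^\bullet e]_\sim \cap [{}^\bullet e']_\sim = \emptyset$ (see Definition 8.2). Using
Definition 6, the presets of $t$ and $t'$ are generated by some of the conditions in
the preset of each $e$ and $e'$ respectively (the folding procedure does not introduce
flow arrows) and we showed above that those conditions generate places
that do not intersect those places generated by conditions in the preset of every
$e'$; thus $[{}^\bullet e]_\sim \cap [{}^\bullet e']_\sim = \emptyset$ iff ${}^\bullet[e]_\sim \cap {}^\bullet[e']_\sim = \emptyset$ iff ${}^\bullet t \cap {}^\bullet t' = \emptyset$ from $t = [e]_\sim$ and
$t' = [e']_\sim$. Using the same reasoning it can be shown that independence between
labels holds iff ${}^\bullet t \cap t'^\bullet = \emptyset$ and $t^\bullet \cap {}^\bullet t' = \emptyset$. Finally from Definition 5 we get
${}^\bullet t \cap {}^\bullet t' = \emptyset \wedge {}^\bullet t \cap t'^\bullet = \emptyset \wedge t^\bullet \cap {}^\bullet t' = \emptyset$ iff $(t,t')$ belongs to the natural
independence relation of $\beta_\sim$. □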
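The three disjointness conditions used at the end of the proof can be checked mechanically on a small net. The following is only an illustrative sketch: the dictionary-based net representation, the function name `naturally_independent` and the toy net are assumptions made for this example and are not taken from the paper or from the tool.

```python
def naturally_independent(t, u, pre, post):
    """Check the three conditions from the proof for transitions t and u:
    disjoint presets, and neither transition produces into the other's preset."""
    return (pre[t].isdisjoint(pre[u])
            and pre[t].isdisjoint(post[u])
            and post[t].isdisjoint(pre[u]))


# Hypothetical toy net: a and b touch disjoint places, c consumes what a produces.
pre = {"a": {"p1"}, "b": {"p2"}, "c": {"p3"}}
post = {"a": {"p3"}, "b": {"p4"}, "c": {"p5"}}

ab_independent = naturally_independent("a", "b", pre, post)  # disjoint neighbourhoods
ac_independent = naturally_independent("a", "c", pre, post)  # a feeds c, so dependent
```

The check is symmetric in its two arguments because the second and third conditions mirror each other.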


4.2 Controlling Generalization via Negative Information 

We have shown that IP folding equivalences preserve independence. However,
they could still introduce new unintended behaviour not present in S. In this
section we limit this phenomenon by considering negative information, denoted by
traces that should not be allowed by the model. Concretely, we consider negative
information which is also given in the form of sequences $\sigma \in \mathcal{L}^- \subseteq A^*$. Negative
information is often provided by an expert, but it can also be obtained automatically
by recent methods [19]. Very few techniques in the literature use negative
information in process discovery [10]. In this work, we assume a minimality
criterion on the negative traces used:

Assumption 1. Let $\mathcal{L} := \mathcal{L}^+ \uplus \mathcal{L}^-$ be a pair of positive and negative logs and $\diamond$ the
independence relation given by the expert. Any negative trace $\sigma \in \mathcal{L}^-$ corresponds
to the local configuration of some event $e_\sigma$ in $\beta_{\mathcal{L},\diamond}$.

This assumption implies that each negative trace is of the form $\sigma' a$ where $\sigma'$
only contains the actions that are necessary to fire $a$. If $a$ can happen without
them, they should not be considered part of $\sigma$. By removing all events $e_\sigma$ from
$\beta_{\mathcal{L},\diamond}$ (one for each negative trace $\sigma \in \mathcal{L}^-$), we obtain a new occurrence net
denoted by $\beta_{\mathcal{L},\diamond,*}$. The goal of this section is to fold this occurrence net without
re-introducing the negative traces in the generalization step. If the expert is
unable to provide negative traces satisfying this assumption, the discovery tool
can always let him/her choose $e_\sigma$ from a visual representation of the unfolding.

Definition 9 (Removal-aware folding equivalence). Let $\beta := (B,E,F)$ be
an occurrence net and $\mathcal{L}^-$ a negative log; an equivalence relation $\sim$ is called
removal aware (RA) folding equivalence iff

1. $\sim$ is a SP folding equivalence, and

2. for every $\sigma \in \mathcal{L}^-$ and $e' \in E$ we have $\lambda(e') = \lambda(e_\sigma)$ implies $[{}^\bullet e']_\sim \not\subseteq [{}^\bullet e_\sigma]_\sim$.

The folded net obtained from $\beta_{\mathcal{L},\diamond,*}$ with a RA folding equivalence does not
contain any of the negative traces.

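Theorem 4. Let $\beta_{\mathcal{L},\diamond,*}$ be the unfolding obtained from the log $\mathcal{L} := \mathcal{L}^+ \uplus \mathcal{L}^-$ with
$\diamond$ as the independence relation after removing the corresponding event of each
negative trace and $\sim$ a RA folding equivalence, then $obs(\beta^{\sim}_{\mathcal{L},\diamond,*}) \cap \mathcal{L}^- = \emptyset$.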

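Proof. Let $\sigma := \sigma' a$ and suppose $\sigma \in obs(\beta^{\sim}_{\mathcal{L},\diamond,*}) \cap \mathcal{L}^-$. Since $\sigma \in \mathcal{L}^-$, by Proposition 1
$\sigma \in obs(\beta_{\mathcal{L},\diamond})$; it follows by construction (see Definition 2) that $\sigma$ generates
a unique local configuration which is removed in $\beta_{\mathcal{L},\diamond,*}$ (by removing $e_\sigma$). Thus
$\sigma \notin obs(\beta_{\mathcal{L},\diamond,*})$, but $\sigma' \in obs(\beta_{\mathcal{L},\diamond,*})$ since only the maximal event of the
local configuration is removed. Let $M$ be the marking reached in $\beta_{\mathcal{L},\diamond,*}$ after
$\sigma'$; we know (using Theorem 2) that $\sigma'$ generates a firing sequence in $\beta^{\sim}_{\mathcal{L},\diamond,*}$
which leads to the reachable marking $[M]_\sim$. Since we assumed $\sigma \in obs(\beta^{\sim}_{\mathcal{L},\diamond,*})$,
there exists a transition $[e_a]_\sim$ such that $[M]_\sim \xrightarrow{[e_a]_\sim}$ with $\lambda([e_a]_\sim) = a$, but this
implies (from Definition 6) that the preset of $e_\sigma$ was merged with the preset of
$e_a$, which contradicts the assumption that $\sim$ is a RA folding equivalence. Finally
the assumption was false and $obs(\beta^{\sim}_{\mathcal{L},\diamond,*}) \cap \mathcal{L}^- = \emptyset$. □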

Footnote to Theorem 4: Since Definition 9 refers to the events that generate the
local configurations of the negative traces, the folding equivalence must be defined
over the nodes of $\beta_{\mathcal{L},\diamond}$ and not those of $\beta_{\mathcal{L},\diamond,*}$.



5 Computing Folding Equivalences 


Section 3 presents a discovery algorithm that generates fitting occurrence nets 
and Section 4 defines three classes of folding criteria, SP, IP, and RA, that 
ensure various properties. This section proposes an approach to synthesize SP, IP 
and RA folding equivalences using SMT. 


5.1 SMT Encoding 

We use an SMT encoding to find folding equivalences generating a net $\beta_\sim$ satisfying
specific metric properties. Specifically, given a measure $c$ (cf. Section 2),
decidable in polynomial time, and a number $k \in \mathbb{N}$, we generate an SMT formula
which is satisfiable iff there exists a folding equivalence such that $c(\beta_\sim) = k$. We
consider the number of transitions in the folded net as the measure $c$; however,
theoretically, any other measure that can be computed in polynomial time could
be used. As explained in Section 2, simple functions like counting the number of
nodes/arcs provide reasonable results in practice.

Given an occurrence net $\beta := (B,E,F)$, for every event $e \in E$ and condition
$b \in B$ we have integer variables $v_e$ and $v_b$. The key intuition is that two
events (conditions) whose variables have equal number are equivalent and will
be merged into the same transition (place). The following formulas state, respectively,
that every element of a set $X$ is related with at least one element of a set
$Y$, and that every element of $X$ is not related with any element of $Y$:

$$\phi^{\subseteq}_{X,Y} := \bigwedge_{x \in X} \bigvee_{y \in Y} (v_x = v_y) \qquad \phi^{disj}_{X,Y} := \bigwedge_{x \in X,\, y \in Y} (v_x \neq v_y)$$


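We force any satisfying assignment to represent an SP folding equivalence
(Definition 7) with the following two constraints:

$$\phi^{SP}_{\beta} := \phi^{lab}_{\beta} \wedge \phi^{pre}_{\beta}$$

Formulas $\phi^{lab}_{\beta}$ and $\phi^{pre}_{\beta}$ impose that only equally labeled events should be equivalent
and that if two events are equivalent, then their presets should generate
the same equivalence class:

$$\phi^{lab}_{\beta} := \bigwedge_{e,e' \in E,\ \lambda(e) \neq \lambda(e')} (v_e \neq v_{e'})$$

$$\phi^{pre}_{\beta} := \bigwedge_{e,e' \in E} \big( v_e = v_{e'} \Rightarrow (\phi^{\subseteq}_{{}^\bullet e,\, {}^\bullet e'} \wedge \phi^{\subseteq}_{{}^\bullet e',\, {}^\bullet e}) \big)$$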
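To make the two SP constraints concrete, the sketch below evaluates them on a given integer assignment instead of handing them to a solver. It is a minimal illustration under assumptions: the names `phi_subset` and `satisfies_sp`, the dictionary encoding of events, and the two-event toy net are all hypothetical and do not come from the paper's implementation.

```python
from itertools import product


def phi_subset(xs, ys, v):
    """Every x in xs shares its value with at least one y in ys."""
    return all(any(v[x] == v[y] for y in ys) for x in xs)


def satisfies_sp(events, v):
    """events maps an event name to (label, preset); v maps nodes to integers.
    Returns True iff v respects the labelling and preset constraints."""
    for e, f in product(events, repeat=2):
        (label_e, pre_e), (label_f, pre_f) = events[e], events[f]
        if label_e != label_f and v[e] == v[f]:
            return False  # differently labelled events must not be merged
        if v[e] == v[f] and not (phi_subset(pre_e, pre_f, v)
                                 and phi_subset(pre_f, pre_e, v)):
            return False  # merged events need presets with the same classes
    return True


# Hypothetical toy net: two events labelled b with presets {p1} and {p2}.
events = {"e1": ("b", {"p1"}), "e2": ("b", {"p2"})}
merged = {"e1": 1, "e2": 1, "p1": 1, "p2": 1}  # events and presets merged
broken = {"e1": 1, "e2": 1, "p1": 1, "p2": 2}  # events merged, presets not

merged_ok = satisfies_sp(events, merged)
broken_ok = satisfies_sp(events, broken)
```

An SMT solver searches for such an assignment; the checker only confirms whether a candidate assignment satisfies the constraints.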


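In addition to the properties encoded above, an IP folding equivalence (Definition 8)
should satisfy some other restrictions:

$$\phi^{IP}_{\beta,\diamond} := \phi^{SP}_{\beta} \wedge \phi^{ind}_{\beta,\diamond} \wedge \phi^{co}_{\beta}$$

where $\phi^{ind}_{\beta,\diamond}$ imposes that the presets and postsets of events with independent
labels should generate equivalence classes that do not intersect and $\phi^{co}_{\beta}$ forbids
concurrent conditions to be merged:

$$\phi^{ind}_{\beta,\diamond} := \bigwedge_{e,e' \in E} \big( \lambda(e) \diamond \lambda(e') \Rightarrow (\phi^{disj}_{{}^\bullet e,\, {}^\bullet e'} \wedge \phi^{disj}_{{}^\bullet e,\, e'^\bullet} \wedge \phi^{disj}_{e^\bullet,\, {}^\bullet e'}) \big)$$

$$\phi^{co}_{\beta} := \bigwedge_{b,b' \in B,\ b \mathrel{co} b'} (v_b \neq v_{b'})$$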
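The additional IP restrictions can be illustrated in the same way, by evaluating them on a candidate assignment. This is a sketch under assumptions: `phi_disj`, `satisfies_ip_extra`, the tuple encoding of events and the toy data are hypothetical names chosen for this example, and the independence relation is assumed to be given as a symmetric set of label pairs.

```python
def phi_disj(xs, ys, v):
    """No element of xs shares its value with an element of ys."""
    return all(v[x] != v[y] for x in xs for y in ys)


def satisfies_ip_extra(events, indep, co, v):
    """events maps an event name to (label, preset, postset); indep is a
    symmetric set of label pairs; co is a set of concurrent condition pairs.
    Checks only the two constraints added on top of the SP constraints."""
    for e in events:
        for f in events:
            label_e, pre_e, post_e = events[e]
            label_f, pre_f, post_f = events[f]
            if (label_e, label_f) in indep and not (
                    phi_disj(pre_e, pre_f, v)
                    and phi_disj(pre_e, post_f, v)
                    and phi_disj(post_e, pre_f, v)):
                return False  # independent labels must not share places
    # concurrent conditions must stay in different classes
    return all(v[b] != v[c] for b, c in co)


# Hypothetical toy data: events c and d with independent labels.
events = {"ec": ("c", {"p4"}, {"p6"}), "ed": ("d", {"p3"}, {"p5"})}
indep = {("c", "d"), ("d", "c")}
co = {("p3", "p4")}
separate = {"p3": 1, "p4": 2, "p5": 3, "p6": 4}
merged = {"p3": 1, "p4": 1, "p5": 3, "p6": 4}

separate_ok = satisfies_ip_extra(events, indep, co, separate)
merged_ok = satisfies_ip_extra(events, indep, co, merged)
```

With an empty independence relation and no concurrent pairs, every assignment passes this check, so only the SP constraints remain.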


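Given a negative log $\mathcal{L}^-$, to encode a RA folding equivalence (Definition 9)
we define:

$$\phi^{RA}_{\beta,\mathcal{L}^-} := \phi^{SP}_{\beta} \wedge \bigwedge_{\sigma \in \mathcal{L}^-,\ e' \in E,\ \lambda(e') = \lambda(e_\sigma)} \ \bigvee_{b \in {}^\bullet e'} \bigwedge_{b' \in {}^\bullet e_\sigma} (v_b \neq v_{b'})$$

where the right part of the conjunction imposes that for every $e_\sigma$ generated by
a negative trace and any other event with the same label, their presets cannot
generate the same equivalence class.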
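The removal-aware restriction can also be evaluated on a candidate assignment. The sketch below follows the reading given in the proof of Theorem 5: some condition in the preset of the equally labelled event must differ from every condition in the preset of the removed event. The function name, argument layout and toy data are assumptions made for illustration only.

```python
def satisfies_ra_extra(presets, labels, negative_events, v):
    """presets maps events to sets of conditions, labels maps events to
    actions, negative_events lists the events removed for negative traces,
    v maps conditions to integers."""
    for e_sigma in negative_events:
        for e in presets:
            if e == e_sigma or labels[e] != labels[e_sigma]:
                continue  # only other events with the same label matter
            # need one condition of e that is merged with no condition of e_sigma
            witness = any(all(v[b] != v[b2] for b2 in presets[e_sigma])
                          for b in presets[e])
            if not witness:
                return False
    return True


# Hypothetical toy data: e1 and the removed event es both carry label a.
presets = {"e1": {"p1"}, "es": {"p2"}}
labels = {"e1": "a", "es": "a"}
kept_apart = {"p1": 1, "p2": 2}
merged = {"p1": 1, "p2": 1}

kept_apart_ok = satisfies_ra_extra(presets, labels, ["es"], kept_apart)
merged_ok = satisfies_ra_extra(presets, labels, ["es"], merged)
```

Merging the two presets would let the folded net fire the removed action again, which is exactly what the constraint rules out.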

We now encode the optimality (w.r.t. the number of transitions) of the mined
net. Given an occurrence net $\beta := (B,E,F)$, each event $e \in E$ generates a transition
$v_e$ in the folded net $\beta_\sim$. To impose that the number of transitions in $\beta_\sim$
should be at most $k \in \mathbb{N}$, we define:

$$\phi^{MET}_{\beta,k} := \bigwedge_{e \in E} (1 \leq v_e \leq k)$$


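To find an IP and RA folding equivalence that generates a net with at most
$k$ transitions we propose the following encoding:

$$\phi^{OPT}_{\beta,\diamond,\mathcal{L}^-,k} := \phi^{IP}_{\beta,\diamond} \wedge \phi^{RA}_{\beta,\mathcal{L}^-} \wedge \phi^{MET}_{\beta,k}$$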


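Theorem 5. Let $\mathcal{L} := \mathcal{L}^+ \uplus \mathcal{L}^-$ be a set of positive and negative logs, $\diamond \subseteq A \times A$
an independence relation and $k \in \mathbb{N}$. The formula $\phi^{OPT}_{\beta,\diamond,\mathcal{L}^-,k}$ is satisfiable iff there
exists an IP and RA folding equivalence $\sim$ such that $\beta^{\sim}_{\mathcal{L},\diamond,*}$ contains at most $k$
transitions.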

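Proof. Let $\psi$ be a solution of $\phi^{OPT}_{\beta,\diamond,\mathcal{L}^-,k}$, and let $\sim_\psi$ be the relation such that $x \sim_\psi x'$
iff $\psi \models (v_x = v_{x'})$, i.e. $\psi$ assigns the same value to $v_x$ and $v_{x'}$. By the reflexivity,
symmetry and transitivity of equality on integers it follows that $\sim_\psi$ is an equivalence
relation. The assignment $\psi$ is a solution of the formula iff all of the following are
true:

1. $\phi^{IP}_{\beta,\diamond}$ holds; this is true iff (i) for every two events $e, e'$ with different labels
   $v_e \neq v_{e'}$, (ii) if $v_e = v_{e'}$ then for all $b \in {}^\bullet e$ there exists $b' \in {}^\bullet e'$ such that $v_b = v_{b'}$
   and vice versa, (iii) for every pair $e, e'$ of events with independent labels (iii.a)
   for all conditions $b \in {}^\bullet e, b' \in {}^\bullet e'$ we have $v_b \neq v_{b'}$, (iii.b) for all conditions
   $b \in {}^\bullet e, b' \in e'^\bullet$ we have $v_b \neq v_{b'}$, (iii.c) for all conditions $b \in e^\bullet, b' \in {}^\bullet e'$ we have
   $v_b \neq v_{b'}$, (iv) for every pair $b, b'$ of concurrent conditions we have $v_b \neq v_{b'}$;
   by the definition of $\sim_\psi$, we have (i) for every two events $e, e'$ with different
   labels $e \not\sim_\psi e'$, (ii) if $e \sim_\psi e'$ then $[{}^\bullet e]_{\sim_\psi} = [{}^\bullet e']_{\sim_\psi}$, (iii) for every pair $e, e'$ of
   events with independent labels $[{}^\bullet e]_{\sim_\psi} \cap [{}^\bullet e']_{\sim_\psi} = \emptyset$, $[{}^\bullet e]_{\sim_\psi} \cap [e'^\bullet]_{\sim_\psi} = \emptyset$ and
   $[e^\bullet]_{\sim_\psi} \cap [{}^\bullet e']_{\sim_\psi} = \emptyset$; (iv) $b \mathrel{co} b'$ implies $b \not\sim_\psi b'$; by Definition 8 this is true iff
   the relation $\sim_\psi$ is an IP folding equivalence.

2. $\phi^{RA}_{\beta,\mathcal{L}^-}$ holds; this is true iff (i) for every two events $e, e'$ with different
   labels $v_e \neq v_{e'}$, (ii) if $v_e = v_{e'}$ then for all $b \in {}^\bullet e$ there exists $b' \in {}^\bullet e'$
   such that $v_b = v_{b'}$ and vice versa, (iii) for any trace $\sigma \in \mathcal{L}^-$ and any event
   $e' \in E$ with the same label as $e_\sigma$ there exists a condition $b \in {}^\bullet e'$ such that
   for any condition $b' \in {}^\bullet e_\sigma$ we have $v_b \neq v_{b'}$; by the definition of $\sim_\psi$ we have
   (i) for every two events $e, e'$ with different labels $e \not\sim_\psi e'$, (ii) if $e \sim_\psi e'$ then
   $[{}^\bullet e]_{\sim_\psi} = [{}^\bullet e']_{\sim_\psi}$ and (iii) $[{}^\bullet e']_{\sim_\psi} \not\subseteq [{}^\bullet e_\sigma]_{\sim_\psi}$ for any negative trace $\sigma$ and event
   $e'$ with the same label as $e_\sigma$; by Definition 9 this is true iff $\sim_\psi$ is a RA folding
   equivalence.

3. $\phi^{MET}_{\beta,k}$ holds; this is true iff $v_e \leq k$ for every event $e \in E$; the encoding
   associates a number to each equivalence class (according to $\sim_\psi$) of events and
   bounds the number of equivalence classes by $k$; since the number of transitions
   in $\beta^{\sim_\psi}_{\mathcal{L},\diamond,*}$ corresponds to the number of equivalence classes of events (see
   Definition 6), this is true iff the number of transitions of $\beta^{\sim_\psi}_{\mathcal{L},\diamond,*}$ is bounded
   by $k$. □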

5.2 Finding an Optimal Folding Equivalence 

Section 5.1 explains how to compute a folding equivalence that generates a folded
net with a bounded number of transitions; this section explains how to obtain
the optimal folded net, i.e., the one with the minimal number of transitions satisfying
the properties of Theorem 3 and Theorem 4.

Iterative calls to the SMT solver can be done for a binary search with $k$
between $min_k$ and $max_k$; since only equally labeled events can be merged by
the folding equivalence, the minimal number of transitions in the folded net is
$min_k := |A|$; in the worst case, when events cannot be merged, $max_k := |E|$.
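The binary search described above can be sketched as follows. The solver is abstracted as a callable `is_sat(k)` that is assumed to report whether the encoding for bound k is satisfiable; this callable, the function name `minimal_k` and the stand-in oracle in the example are hypothetical. The search relies on satisfiability being monotone in k, which holds for a bound of the form "at most k transitions".

```python
from typing import Callable, Optional


def minimal_k(is_sat: Callable[[int], bool], lo: int, hi: int) -> Optional[int]:
    """Smallest k in [lo, hi] with is_sat(k), assuming is_sat is monotone
    (once satisfiable for some k, satisfiable for every larger k).
    Returns None if even the largest bound is unsatisfiable."""
    if not is_sat(hi):
        return None
    while lo < hi:
        mid = (lo + hi) // 2
        if is_sat(mid):
            hi = mid        # a solution exists with at most mid transitions
        else:
            lo = mid + 1    # every bound up to mid is too tight
    return lo


# Stand-in oracle for illustration: pretend the encoding is satisfiable from k = 7 on,
# with |A| = 3 distinct actions and |E| = 12 events.
best = minimal_k(lambda k: k >= 7, lo=3, hi=12)
```

The number of solver calls grows logarithmically with the gap between the two bounds, instead of linearly as with a plain upward scan.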

As a side remark, we have noted that the optimal folding equivalence can be
encoded as a MaxSMT problem [14] where some clauses which are called hard
must be true in a solution (in our case $\phi^{IP}_{\beta,\diamond}$ and $\phi^{RA}_{\beta,\mathcal{L}^-}$) and some soft clauses may
not ($\phi^{MET}_{\beta,k}$ for $|A| \leq k \leq |E|$); a MaxSMT solver maximizes the number of soft
clauses that are satisfied and thus it obtains the minimal $k$, generating the
optimal folded net.

6 Experiments 

As a proof of concept, we implemented our approach into a new tool called 
Pod (Partial Order Discovery). It supports synthesis of SP and IP folding
equivalences using a restricted form of our SMT encoding. In particular Pod 
merges all events with equal label, in contrast to the encoding in Section 5 
which may in general yield more than one transition per log action. While this 
ensures a minimum (optimal as per Section 5.2) number of folded transitions, 
the tool could sometimes not find a suitable equivalence (unsatisfiable SMT 
encoding). Since the number of transitions in the folded net is fixed, it turns out 
that the quality of the mined model increases as we increase the number of folded 
places, as we show below. Using Pod we evaluate the ability of our approach to 
rediscover the original process model, given its independence relation and a set 
of logs. For this we have used standard benchmarks from the verification and 
process mining literature [15,18]. 


Tool and benchmarks: http://lipn.univ-paris13.fr/~rodriguez/exp/atva15/.



                Original     Pod (max. places)                Pod (60% places)
Benchmark       |T|   |P|    r_{S⊆M}  r_{M⊆S}  %Prec.  |P|    r_{S⊆M}  r_{M⊆S}  %Prec.  |P|
A(22)            22    20    0.99     1.00     0.77     19    0.57     1.00     0.22     11
A(32)            32    32    1.00     1.00     0.80     32    0.46     1.00     0.19     19
A(42)            42    47    0.98     1.00     0.54     40    0.79     1.00     0.21     28
T(32)            33    31    1.00     1.00     0.88     31    0.54     1.00     0.19     18
Angio(1)         64    39    0.39     0.94     0.18     21    0.10     0.92     0.06     13
Complex          19    13    0.98     1.00     0.62     12    0.62     1.00     0.39      7
ConfDimB         11    10    1.00     1.00     1.00     10    0.62     1.00     0.39      6
Cycles(5)        20    16    1.00     1.00     1.00     16    0.60     1.00     0.40      6
DbMut(2)         32    38    1.00     1.00     0.94     32    0.76     1.00     0.21     19
Dc               32    35    0.99     0.99     0.77     27    0.84     0.99     0.38     21
Peters(2)       126   102    0.45     1.00     0.07     51    0.30     1.00     0.05     30

Table 1. Experimental results.


In our experiments, Table 1, we consider a set of original processes faithfully
modelled as safe Petri nets. For every model S we consider a log $\mathcal{L}$, i.e.
a subset of its traces. We extract from S the (best) independence relation $\diamond_S$
that an expert could provide. We then provide $\mathcal{L}$ and $\diamond_S$ to Pod and find an
SP folding equivalence with the largest number of places (cols. “max. places”)
and with 60% of the places of S (last group of cols.), giving rise to two different
mined models. All three models, original plus mined ones, have perfect fitness
but varying levels of precision, i.e. traces of the model not present in the log. For
the mined models, we report (cols. “%Prec.”) on the ratio between their precision
and the precision of the original model S. All precisions were estimated
using the technique from [2]. All Pod running times were below 10s.

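Additionally, we measure how much independence of the original model is
preserved in the mined ones. For that, we define the ratios $r_{S \subseteq M} := |\diamond_S \cap \diamond_M| / |\diamond_S|$
and $r_{M \subseteq S} := |\diamond_S \cap \diamond_M| / |\diamond_M|$. The closer $r_{S \subseteq M}$ is to 1, the larger is the number of
pairs in $\diamond_S$ also contained in $\diamond_M$ (i.e., the more independence was preserved),
and conversely for $r_{M \subseteq S}$ (the less independence was “invented”). Remark that
$\diamond_S = \diamond_M$ iff $r_{S \subseteq M} = r_{M \subseteq S} = 1$.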
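As a small worked illustration of the two ratios, the sketch below computes them for independence relations given as sets of ordered pairs. The function name, the example relations and the convention of returning 1.0 for an empty relation are assumptions of this sketch, not part of the experimental setup.

```python
def independence_ratios(ind_s, ind_m):
    """Return (r_S_in_M, r_M_in_S) for two independence relations given as
    sets of pairs: the fraction of pairs of each relation found in the other.
    An empty relation is treated as fully contained (ratio 1.0) by assumption."""
    common = ind_s & ind_m
    r_s_in_m = len(common) / len(ind_s) if ind_s else 1.0
    r_m_in_s = len(common) / len(ind_m) if ind_m else 1.0
    return r_s_in_m, r_m_in_s


# Hypothetical relations: the mined model keeps a/b independent but loses c/d.
ind_original = {("a", "b"), ("b", "a"), ("c", "d"), ("d", "c")}
ind_mined = {("a", "b"), ("b", "a")}

ratios = independence_ratios(ind_original, ind_mined)
```

In this example half of the original independence survives and none is invented, so the first ratio is 0.5 and the second is 1.0.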

In 7 out of the 11 benchmarks in Table 1 our proof-of-concept tool rediscovers
the original model or finds one with only minor differences. This is even more
encouraging when considering that we only asked Pod to find SP equivalences
which, unlike IP, do not guarantee preservation of independence. In 9 out of 11
cases both ratios $r_{S \subseteq M}$ and $r_{M \subseteq S}$ are above 98%, witnessing that independence is
almost entirely preserved. Concerning the precision, we observe that it is mostly
preserved for these 9 models. We observe a clear correlation between the number
of discovered places and the precision of the resulting model. The running times
of Pod on all benchmarks in Table 1 were under a few seconds.

In Peters(2) and Angio(1) our tool could not increase the number of places
in the folded net, resulting in a significant loss of independence and precision. We
tracked the reason down to (a) the additional restrictions on the SMT encoding
imposed by our implementation and (b) the algorithm for transforming event
structures into unfoldings (i.e., introducing conditions). We plan to address this
in future work. This also prevented us from employing IP equivalences instead
of SP for these experiments: Pod could find IP equivalences for only 5 out of 11
cases. Nonetheless, as we said before, in 9 out of 11 cases the SP equivalences found
preserved at least 98% of the independence.

Finally, we instructed Pod to synthesize SP equivalences folding into an
arbitrarily chosen low number of places (60% of the original). Here we observe
a large reduction of precision and significant loss of independence (surprisingly
only $r_{S \subseteq M}$ drops, but not $r_{M \subseteq S}$). This witnesses a strong dependence between
the number of discovered places and the ability of our technique to preserve
independence.


7 Related Work 


To the best of our knowledge, there is no technique in the literature that solves 
the particular problem we are considering in this paper: given a set of positive and 
negative traces and an independence relation on events, derive a Petri net that 
both preserves the independence relation and satisfies the quality dimensions 
enumerated in Section 2. However, there is related work that intersects partially 
with the techniques of this paper. We now report on it. 

Perhaps the closest work is [8], where the simplification of an initial process 
model is done by first unfolding the model (to derive an overfitting model) 
and then folding it back in a controlled manner, thus generalizing part of the 
behavior. The approach can only be applied for fitting models, which hampers its 
applicability unless alignment techniques [1] are used. The folding equivalences 
presented in this paper do not consider a model and therefore are less restrictive 
than the ones presented in [8]. 

Synthesis is a problem different from discovery: in synthesis, the underlying
system is given and therefore one can assume $S = \mathcal{L}$. Considering a synthesis
scenario, Bergenthum et al. have investigated the synthesis of a p/t net from
partial orders [3]. The class of nets considered in this paper (safe Petri nets)
is less expressive than p/t nets, which in practice poses no problems in the
context of business processes. The algorithms in [3] are grounded in the theory
of regions and split the problem into two steps: (i) the p/t net $M$ is generated
which, by construction, satisfies $\mathcal{L} \subseteq obs(M)$, and (ii) it is checked whether
$\mathcal{L} = obs(M)$. Actually, by avoiding (ii), a discovery scenario is obtained where
the generalization feature is not controlled, in contrast to the technique of this
paper. With the same goal but relying on ad-hoc operators tailored to compose
lpos (choice, sequentialization, parallel compositions and repetition), a discovery
technique is presented in [4]. Since the operators may in practice introduce wrong
generalizations, a domain expert is consulted for the legality of every extra run.


8 Conclusions 


A fresh look at process discovery is presented in this paper, which establishes a
theoretical basis for coping with some of the challenges in the field. By automating
the folding of the unfolding that covers traces in the log but also combinations
thereof derived from the input independence relation, problems like log
incompleteness and noise may be alleviated. The approach has been implemented and
the initial results show the potential of the technique in rediscovering a model,
even for the simplest of the folding equivalences described in this paper.

Next steps will focus on implementing the remaining folding equivalences,
and in general improving the SMT constraints for computing folding equivalences.
Also, incorporating the notion of trace frequency in the approach will be
considered, to guide the technique to focus on principal behavior. This will also
allow testing the tool in the presence of incomplete or noisy logs.
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